Google is testing a new audio codec, called Lyra, which dramatically improves audio quality over poor network connections. (Google)

Google Duo is testing a new codec to give you better call quality even on bad connections

Adriana Marais | March 2, 2021 | Google AI, Google Duo, Lyra

The world may be bracing for 5G, but the reality is that a vast majority of people still contend with slow data speeds and poor connectivity. To help with this, Google Duo uses compression techniques to provide the best possible audio and video experience over poor or spotty connections.

Google is testing a new audio codec that dramatically improves audio quality over poor network connections. The Google AI team introduced Lyra, a low-bitrate voice codec, in a detailed blog post. Lyra’s basic architecture is to “extract distinctive speech attributes (characteristics) in the form of log mel spectrograms”. These features are compressed, transmitted over the network, and turned back into speech at the other end using a generative model.
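To make the feature-extraction step concrete, here is a minimal NumPy sketch of computing log mel spectrogram features from raw audio. It is not Lyra's implementation: the 16 kHz sample rate, the 40 ms non-overlapping frames, the 40 mel bands, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate=16000, frame_ms=40, n_mels=40):
    """Return one row of log mel energies per frame (all parameters assumed)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    window = np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bank = mel_filterbank(n_mels, frame_len, sample_rate)
    # Small constant avoids log(0) on silent bands.
    return np.log(power @ bank.T + 1e-6)

# One second of a 440 Hz tone stands in for speech.
t = np.arange(16000) / 16000
tone = np.sin(2 * np.pi * 440.0 * t)
features = log_mel_spectrogram(tone)
print(features.shape)  # (25, 40)
```

Each row is a compact description of one 40 ms slice of audio. A codec of this kind would quantize and transmit rows like these rather than the waveform itself.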

So far, this is also what traditional parametric codecs do. Lyra differs in using a new high-quality generative audio model that can both extract the critical parameters from speech and reconstruct speech from a minimal amount of data.

The new generative model used in Lyra is based on Google’s older work on WaveNetEQ, a “generative model-based packet loss concealment system” currently in use on Google Duo.

Google explained that this approach has made Lyra “on par with the advanced waveform codecs” which are used in many streaming and communications platforms. The advantage of Lyra over these other codecs, as claimed by Google, is that Lyra does not send the signal sample by sample, which requires a higher bit rate and therefore more data.
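A rough calculation shows why avoiding sample-by-sample transmission matters. The sketch below computes the bitrate of uncompressed PCM audio and compares it with the 3 kbps figure cited for Lyra. The 16 kHz, 16-bit mono format is an assumption, and real waveform codecs compress well below raw PCM, so the ratio is only an upper bound on the gap.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM, where every sample is sent as-is."""
    return sample_rate_hz * bits_per_sample * channels

raw_bps = pcm_bitrate_bps(16_000, 16)  # assumed format: 16 kHz, 16-bit, mono
lyra_bps = 3_000                       # target bitrate cited in the article
ratio = raw_bps / lyra_bps

print(raw_bps, round(ratio, 1))  # 256000 85.3
```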

Lyra uses a “cheaper recurrent generative model” that operates at a lower rate but generates multiple signals in different frequency ranges in parallel, which are later combined into “a single output signal at the desired sample rate”.

Running a generative model like this on a midrange device “produces 90ms processing latency” and Google says this is consistent with other traditional voice codecs.

Combined with the AV1 codec for video, this can make video chats possible even for users on an “old 56kbps dial-up modem”. Google explained that Lyra is designed to operate in severely bandwidth-limited environments, at bitrates as low as 3 kbps.
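The bandwidth figures above can be turned into a simple budget. The sketch below shows how much of a 56 kbps link is left for video once 3 kbps goes to audio, and how few bits each audio frame gets at that rate. The 40 ms frame interval and the zero-overhead default are assumptions for illustration, and real links lose some capacity to packet headers and retransmissions.

```python
def remaining_bandwidth_bps(link_bps: int, audio_bps: int, overhead_bps: int = 0) -> int:
    """Bandwidth left for video after audio and (assumed) protocol overhead."""
    return link_bps - audio_bps - overhead_bps

def bits_per_frame(bitrate_bps: int, frame_ms: int) -> int:
    """Bits available to encode one audio frame at a given bitrate."""
    return bitrate_bps * frame_ms // 1000

video_budget = remaining_bandwidth_bps(link_bps=56_000, audio_bps=3_000)
frame_bits = bits_per_frame(bitrate_bps=3_000, frame_ms=40)  # 40 ms is assumed

print(video_budget)  # 53000
print(frame_bits)    # 120
```

At roughly 120 bits per frame, there is no room to describe a waveform directly, which is why the decoder has to regenerate speech from a handful of parameters.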

Google also said that, at very low bitrates, Lyra can outperform codecs such as Speex, MELP, and AMR, as well as open-source, royalty-free codecs such as Opus. Google has shared speech samples in its blog post.

Lyra has been trained “with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries, then verifying audio quality with expert and crowdsourced listeners,” Google said. The new codec is already being rolled out on Google Duo. Lyra is currently used for speech, but Google is also exploring how to use it as a general-purpose audio codec.